Efficient String Mining under Constraints Via the Deferred Frequency Index
Authors
Abstract
We propose a general approach to frequency-based string mining, which has many applications, e.g., in contrast data mining. Our contribution is a novel algorithm based on a deferred data structure. Despite its simplicity, our approach is up to 4 times faster and uses about half the memory of the best-known algorithm of Fischer et al. Applications in various string domains, e.g., natural language, DNA, or protein sequences, demonstrate the improvement achieved by our algorithm.
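To make the problem setting concrete, the following is a minimal sketch of frequency-based contrast mining on two small string databases. It is a naive baseline that enumerates every substring. It is not the deferred frequency index described above, nor the algorithm of Fischer et al. The function names and the thresholds `min_pos` and `max_neg` are hypothetical and chosen only for illustration, and both databases are assumed to be non-empty.

```python
from collections import Counter


def substring_support(database):
    """Count, for every substring, how many strings in the database contain it."""
    support = Counter()
    for text in database:
        # Collect each distinct substring once per string (document frequency).
        seen = {
            text[i:j]
            for i in range(len(text))
            for j in range(i + 1, len(text) + 1)
        }
        support.update(seen)
    return support


def mine_contrast_strings(positive, negative, min_pos, max_neg):
    """Return substrings with relative support >= min_pos in `positive`
    and <= max_neg in `negative` (hypothetical thresholds)."""
    pos_support = substring_support(positive)
    neg_support = substring_support(negative)
    result = []
    for pattern, count in pos_support.items():
        pos_freq = count / len(positive)
        neg_freq = neg_support.get(pattern, 0) / len(negative)
        if pos_freq >= min_pos and neg_freq <= max_neg:
            result.append(pattern)
    return sorted(result)


# Toy DNA-like data: find strings present in every positive example
# and absent from every negative example.
positive = ["gattaca", "tacag", "ctaca"]
negative = ["gattc", "cagt"]
patterns = mine_contrast_strings(positive, negative, min_pos=1.0, max_neg=0.0)
print(patterns)
```

This baseline enumerates a quadratic number of substrings per string, so it is only practical for small inputs. Index-based approaches such as the one proposed here address the same query far more efficiently.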
Similar resources
Impact of Pollution Location on Time and Frequency Characteristics of Leakage Current of Porcelain Insulator String under Different Humidity and Contamination Severity
One of the important factors influencing the performance of outdoor insulators is pollution. Pollution, especially under humid conditions, reduces the surface resistance of the insulator and leads to a flow of Leakage Currents (LC) on the insulator surface, which may result in total flashover. The LC characteristics are affected by parameters such as the nature and severity of pollution. Loca...
Efficient Optimum Design of Structures With Frequency Response Constraint Using High Quality Approximation
An efficient technique is presented for the optimum design of structures with both natural frequency and complex frequency response constraints. The main idea is to reduce the number of dynamic analyses by introducing high-quality approximations. Eigenvalues are approximated using the Rayleigh quotient. Eigenvectors are also approximated for the evaluation of eigenvalues and frequency responses. A tw...
Introducing Softness into Inductive Queries on String Databases
In many application domains (e.g., WWW mining, molecular biology), large string datasets are available and yet under-exploited. The inductive database framework assumes that both such datasets and the various patterns holding within them might be queryable. In this setting, queries which return patterns are called inductive queries and solving them is one of the core research topics for data mi...
Approximate String Similarity Join using Hashing Techniques under Edit Distance Constraints
The string similarity join, which finds similar string pairs in string sets, has received extensive attention in the database and information retrieval fields. Existing work on this problem usually adopts the filter-and-refine framework, within which various filtering methods have been proposed. Recently, tree-based index techniques with the edit distance co...
Optimal String Mining Under Frequency Constraints
We propose a new algorithmic framework that solves frequency-related data mining queries on databases of strings in optimal time, i.e., in time linear in the input and the output size. The additional space is linear in the input size. Our framework can be used to mine frequent strings, emerging strings, and strings that pass other statistical tests, e.g., the χ²-test. In contrast to the presented...
Publication date: 2008